Introducing the Concept of Decisional DNA-Based Web Content Mining
Abstract
The rapid expansion of the Internet and the persistent growth of the amount of information in Web content create several problems and challenges. These challenges relate mainly to the difficulty of extracting potentially useful information and knowledge from Internet pages. To support Web content knowledge capture, we propose a concept that integrates a novel decisional DNA knowledge structure with traditional Web crawler technologies. Decisional DNA, as a knowledge representation platform, can deal with noisy and incomplete data, learn from experience, and support precise decisions and predictions in vague and fuzzy environments. We illustrate the proposed concept with a set of experiments that demonstrate its initial efficiency and effectiveness.
INTRODUCTION
As one of the main sources of information, the Internet plays a very important role in daily life. It is a massive information and knowledge resource, but it also contains a huge amount of useless, unwanted spam (Azam et al. 2010). It is therefore increasingly important for users to apply appropriate techniques that help them find the information they actually want.
On this basis, Web data mining has become one of the most dynamically expanding research areas in the domain of high technology (Azam et al. 2010). This article introduces a novel and explicit Web searching approach that integrates the traditional technique of a Web crawler with the decisional DNA (DDNA) knowledge structure introduced by Sanin and Szczerbicki (2006, 2009).
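As a rough illustration of the general idea of pairing a crawler with a structured knowledge record, the following Python sketch crawls a small in-memory set of pages and stores what it extracts from each page as a set of named variables. It is only a sketch under stated assumptions: `ExperienceRecord`, `PageParser`, `capture`, `crawl`, and the `PAGES` data are hypothetical names invented for this example, and the record layout is a simplified stand-in, not the decisional DNA structure defined by Sanin and Szczerbicki.

```python
from collections import deque
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class ExperienceRecord:
    """Hypothetical, simplified knowledge record (not the actual DDNA schema)."""
    url: str
    variables: dict = field(default_factory=dict)


class PageParser(HTMLParser):
    """Collects the page title and all link targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def capture(url, html):
    """Turn one page into a record of named variables."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return ExperienceRecord(
        url=url,
        variables={
            "title": parser.title.strip(),
            "links": parser.links,
            "link_count": len(parser.links),
        },
    )


def crawl(pages, start, limit=10):
    """Breadth-first crawl over an in-memory 'web' (dict of url -> html)."""
    seen, queue, repository = {start}, deque([start]), []
    while queue and len(repository) < limit:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # unknown page: skip it, as a crawler would skip a dead link
            continue
        record = capture(url, html)
        repository.append(record)
        for link in record.variables["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return repository


# Hypothetical sample data standing in for fetched pages.
PAGES = {
    "/home": "<html><head><title>Home</title></head><body>"
             "<a href='/about'>About</a><a href='/missing'>Gone</a></body></html>",
    "/about": "<title>About us</title><a href='/home'>Home</a>",
}

if __name__ == "__main__":
    for record in crawl(PAGES, "/home"):
        print(record.url, record.variables)
```

Keeping the pages in a dictionary avoids network access, so the sketch can be run as is; a real crawler would fetch pages over HTTP and would need politeness rules, error handling, and URL normalization that are omitted here.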
Journal: Cybernetics and Systems
Volume: 43
Publication date: 2012